YuPcre2 v1.2.0 for D7-XE10
YuPcre2 is a new regular expression library for Delphi with Perl syntax. It directly supports UnicodeString, AnsiString, and UCS4String, as well as UTF-8 and UTF-16.
YuPcre2 provides two matching algorithms, the standard Perl algorithm and an alternative DFA algorithm:
The Perl algorithm is what you are used to from Perl and JavaScript. It is fast and supports the complete pattern syntax. You will likely use it most of the time.
DFA is a special-purpose algorithm. It finds all possible matches at a given starting position and, in particular, the longest one. It never backtracks and has better support for partial matching, in particular multi-segment matching of very long subject strings.
**YuPcre2 1.2.0 – 4 Mar 2016**
New features:
New option to limit the length of a pattern: TDIRegEx2Base.MaxPatternLength and pcre2_set_max_pattern_length.
New option to limit the offset of unanchored matches: TDIRegEx2Base.OffsetLimit and pcre2_set_offset_limit.
New pcre2_substitute options PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNSET_EMPTY, PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.

Bug fixes:
In a character class such as `[\W\p{Any}]`, where both a negative-type escape (“not a word character”) and a property escape were present, the property escape was being ignored.
Fixed integer overflow for patterns whose minimum matching length is very, very large.
The special sequences `[[:<:]]` and `[[:>:]]` gave rise to incorrect compile errors or other strange effects if compiled in UCP mode.
Added group information caching, which improves compilation speed when checking whether a group has a fixed length and/or could match an empty string, especially when recursion or subroutine calls are involved.
If `[:^ascii:]` or `[:^xdigit:]` is present in a non-negated class, all characters with code points greater than 255 are in the class. When a Unicode property was also in the class (if PCRE2_UCP is set, escapes such as `\w` are turned into Unicode properties), wide characters were not handled correctly and could fail to match. Negated classes such as `[^[:^ascii:]\d]` were also not working correctly in UCP mode.
If PCRE2_AUTO_CALLOUT was set on a pattern that had a `(?#` comment between an item and its quantifier (for example, `A(?#comment)?B`), pcre2_compile misbehaved.
Similarly, if an isolated `\E` was present between an item and its quantifier when PCRE2_AUTO_CALLOUT was set, pcre2_compile misbehaved.
The error for an invalid UTF pattern string always gave the code unit offset as zero instead of the offset where the invalidity was found.
An empty `\Q\E` sequence between an item and its quantifier caused pcre2_compile to misbehave when auto callouts were enabled.
If both PCRE2_ALT_VERBNAMES and PCRE2_EXTENDED were set, and a `(*MARK)` or other verb “name” ended with white space immediately before the closing parenthesis, pcre2_compile misbehaved. Example: `(*:abc )`, but only when both those options were set.
In a number of places pcre2_compile was not handling NUL (binary zero) characters correctly.
If a pattern that was compiled with PCRE2_EXTENDED started with white space or a #-type comment that was followed by `(?-x)`, which turns off PCRE2_EXTENDED, and there was no subsequent `(?x)` to turn it on again, pcre2_compile assumed that `(?-x)` applied to the whole pattern and consequently mis-compiled it. The fix for this bug means that a setting of any of the `(?imsxU)` options at the start of a pattern is no longer transferred to the options that are returned by PCRE2_INFO_ALLOPTIONS. This was an anachronism that should have changed when the effects of those options were all moved to compile time.
An escaped closing parenthesis in the “name” part of a `(*verb)` when PCRE2_ALT_VERBNAMES was set caused pcre2_compile to malfunction.
